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LiDAR waveform data from airborne LiDAR scanners (ALS) e.g. the Land Vegetation and Ice Sensor (LVIS) have 
been successfully used for estimation of forest height and biomass at local scales and have become the preferred 
remote sensing dataset. However, regional and global applications are limited by the cost of the airborne LiDAR 
data acquisition and there are no available spaceborne LiDAR systems. Some researchers have demonstrated the 
potential for mapping forest height using aerial or spaceborne stereo imagery with very high spatial resolutions. 
For stereo images with global coverage but coarse resolution new analysis methods need to be used. Unlike most 
research based on digital surface models, this study concentrated on analyzing the features of point cloud data 
generated from stereo imagery. The synthesizing of point cloud data from multi-view stereo imagery increased 
the point density of the data. The point cloud data over forested areas were analyzed and compared to small foot- 
print LiDAR data and large-footprint LiDAR waveform data. The results showed that the synthesized point cloud 
data from ALOS/PRISM triplets produce vertical distributions similar to LiDAR data and detected the vertical 
structure of sparse and non-closed forests at 30 m resolution. For dense forest canopies, the canopy could be cap- 
tured but the ground surface could not be seen, so surface elevations from other sources would be needed to cal- 
culate the height of the canopy. A canopy height map with 30 m pixels was produced by subtracting national 
elevation dataset (NED) from the averaged elevation of synthesized point clouds, which exhibited spatial features 
of roads, forest edges and patches. The linear regression showed that the canopy height map had a good correla- 
tion with RH50 of LVIS data with a slope of 1 .04 and R 2 of 0.74 indicating that the canopy height derived from 
PRISM triplets can be used to estimate forest biomass at 30 m resolution. 

© 2014 Elsevier Inc. All rights reserved. 


1. Introduction 

The accurate mapping of forest biomass over large areas is important 
for studies of global climate change and the carbon cycle. Recent progress 
in the estimation of forest biomass from remote sensing data is mainly 
due to the success in the extraction of related forest structure parameters. 
The data sources and methods for obtaining information on forest vertical 
structure can be summarized as the following four categories: 

( 1 ) LiDAR data is the best dataset for the estimation of forest biomass 
because it can directly measure the vertical structure of forests. 
Drake et al. (2002) explored the ability of large-footprint LiDAR 
instruments to estimate important structural attributes using 
the data acquired by the airborne Land Vegetation and Ice Sensor 
(LVIS). They reported that LVIS metrics were able to predict 
aboveground biomass with R 2 = 0.93. Sun, Ranson, Rimes, 
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Blair, and Kovacs (2008) found that LiDAR waveforms acquired 
by the spaceborne Geoscience Laser Altimeter System (GLAS) 
was highly correlated with that of LVIS with R 2 = 0.82 for Octo- 
ber 2003 GLAS data. Bortolot and Wynne (2005) developed an 
individual tree-based algorithm for determining forest biomass 
using small footprint LiDAR data. Pang, Lefsky, Miller, Sherrill, 
and Andersen (2008) explored the automatic tree crown delin- 
eation using discrete return LiDAR. Ni-Meister et al. (2010) 
assessed the application of vegetation structure parameter de- 
rived from LiDAR data for the mapping of aboveground biomass. 
Lefsky et al. (2005) estimated forest height and biomass using 
spaceborne Geoscience Laser Altimeter System (GLAS) data. 
Dubayah et al. (2010) estimated tropical forest height and bio- 
mass using LVIS. However, it is difficult to collect data at regional 
or global scales using airborne LiDAR because of its cost. GLAS ac- 
quired during the ICESat mission (2003-2009) had global cover- 
age, but only provided point sampling data. Other imagery data is 
needed to extrapolate the samples to generate large area results. 

(2) Forest structure can be extracted from the height of the scatter- 
ing phase center (HSPC) of InSAR data. This requires knowledge 
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of the elevation of the ground surface under the forest from other 
sources. Simard et al. (2006) mapped mean tree height in man- 
grove forests in the Everglades National Park using C-band 
HSPC of Shuttle Radar Topography Mission (SRTM) and Solberg 
et al. (2010) showed good results with X-band SRTM data. 
Kellndorfer et al. (2004) reported that the HSPC from SRTM 
with surface elevation from national elevation datasets (NED) 
could estimate vegetation height at Georgia and California sites 
with R 2 = 0.79 and 0.75, respectively. However, except for 
SRTM, most spaceborne InSAR data depend on repeat-pass oper- 
ation and the quality of these data may be impacted by temporal 
decorrelation (Hall et al., 2011). 

(3) The extraction of forest height through the combination of dual 
wavelength InSAR data. Neeff, Dutra, dos Santos, Freitas, and 
Araujo (2005) used the difference between digital surface 
models (DSM) from X-band and P-band InSAR data as a measure 
of vegetation height in the estimation of forest biomass. Balzter, 
Rowland, and Saich (2007) used X-band and L-band InSAR data 
acquired by E-SAR airborne sensors to estimate the top heights 
of forest stands. This research mainly utilized the dependence 
of penetration depth of SAR data on wavelength. These studies 
are only applicable at local scales due to the limited coverage of 
airborne data. 

(4) Polarimetric Interferometric SAR (PolInSAR) is a promising tech- 
nique for the estimation of forest height and biomass. PolInSAR 
employs the dependence of penetration depth of SAR on polari- 
zation (Cloude & Papathanassiou, 2003; Garestier, Dubois- 
Fernandez, & Papathanassiou, 2008; Papathanassiou & Cloude, 
2001 ). Temporal decorrelation is the main problem in the appli- 
cation of PolInSAR and limits the usefulness of the repeat pass 
InSAR data (e.g„ Hall et al., 2011). 

Besides the above four approaches, photogrammetry (or stereo im- 
agery) is another technique which is directly sensitive to the forest ver- 
tical structure. Photogrammetry is a traditional technique for the 
extraction of a digital surface model (DSM). Information about forest 
canopy height should be contained in the stereo imagery because it is 
acquired by an optical sensor that records the signal mainly reflected 
by the top surface of the forest canopy. In the past, photogrammetry uti- 
lized aerial stereo images to map the elevation of the ground surface. 
The manual selection of matching ground points in the stereo pairs sup- 
pressed information about the forest structure. The high acquisition cost 
of aerial photos, the narrow swath, the complicated mathematical equa- 
tions and the professional processing techniques of traditional photo- 
grammetry based on film images hindered its application beyond 
surveying and mapping. Therefore, photogrammetry has not been 
used extensively for estimation of forest height and biomass. 

The launch of spaceborne sensors, the application of digital cameras, 
the maturation of photogrammetry theory and the development of fully 
digital and automatic image processing make the application of photo- 
grammetric methods easier. However, these advances occurred nearly 
in the same period with the appearance of LiDAR data. Many re- 
searchers working on forest height and biomass estimation began 
using LiDAR data. Nevertheless, some researchers in the field of survey- 
ing and mapping began to investigate the extraction of forest structure 
from stereo imagery at the beginning of this century. Sheng, Gong, and 
Biging (2001) successfully reconstructed the crown surface of a red- 
wood tree from high-resolution aerial images. Gong, Mei, Biging, and 
Zhang (2002) presented a correction method for the improvement of 
the canopy boundary locations in the digital surface model (DSM) de- 
rived from high-resolution aerial images. Naesset (2002) measured 
the mean tree height of 73 forest stands in a 1000 ha forest area by au- 
tomatic stereo processing of aerial images. Korpela (2004) explored the 
plausibility of the use of multi-scale aerial photographs to conduct forest 
inventory at the individual tree level. The results showed that very high 
accuracy at the individual tree level could not be expected while the 


aggregate stand variables could be estimated very accurately. Nuske 
and Nieschulze (2004) investigated the possibility to automatically de- 
rive a stand height for a dense beech stand using stereo imagery with a 
resolution of 0.44 m. The canopy height model was derived by the dif- 
ference between a DSM from stereo imagery and existing digital eleva- 
tion model (DEM, i.e. the digital elevation model of ground surface 
under a forest canopy); St-Onge, Jumelet, Cobello, and Vega (2004) in- 
vestigated the extraction of tree height through the combinations of 
the DEM from aerial stereo imagery and that from LiDAR data. St- 
Onge, Vega, Fournier, and Hu (2008) created hybrid photo-LiDAR cano- 
py height models (CHMs) by subtracting the LiDAR DEM from the aerial 
photogrammetric DSM. The results showed that the quality of photo- 
grammetric DSM could be affected by sunlight and viewing geometry. 
Photo-LiDAR CHMs were well correlated to their LiDAR counterparts 
on a pixel-wise basis but have a lower resolution and accuracy. It also 
demonstrated that plot metrics extracted from the LiDAR and photo- 
LiDAR CHMs, such as height at the 95th percentile of 20 m x 20 m win- 
dows, were highly correlated. Toutin (2004) reported the results from 
1KONOS spaceborne stereo imagery with 0.8 m resolution. Hobi and 
Ginzler (2012) discussed the DSMs created from stereo imagery ac- 
quired by the spaceborne sensor WorldView-2 with a resolution of 
0.46 m at Nadir. 

These studies demonstrated that the stereo imagery held great po- 
tential for deriving canopy height. However, most of the current studies 
were conducted on aerial images or spaceborne images with very high 
resolutions (<1.0 m). The resolution of spaceborne stereo sensors 
with global coverage is not as high. The characteristics of DSMs could 
be directly affected by image resolution because the DSM from stereo 
imagery depends on the number and quality of the matching points 
from stereo images recognized automatically based on image textures. 
In addition, most current research in this area are mainly based on the 
digital surface model (DSM), which is an interpolation of the elevations 
of matching points. 

The point cloud (PC) data (three-dimensional locations of matching 
points) similar to that of small footprint LiDAR data can be derived from 
stereo photogrammetry processing and has not been fully examined 
over forested areas. The methods used for analysis of LiDAR point 
cloud data can be applied to the point cloud of stereo imagery. However 
the acquisition of point clouds in photogrammetry is quite different 
from LiDAR so the indices derived from cloud data may have different 
information about forest structure. 

Some spaceborne stereo sensors are composed of three independent 
optical systems for viewing nadir (N), forward (F) and backward (B) 
producing a set of stereo images along the satellite's track. In studies 
using multi-view stereo imagery, the DSMs derived from different com- 
binations of stereo pairs were often simply averaged (Leberl et al., 
2010). In fact, the different combinations of views have different infor- 
mation. For example, for a satellite in a descending polar-orbit, the for- 
ward sensor could see the north side of forest canopy (tree crowns) ; the 
backward sensor could see the south side of forest canopy while both 
sides are visible on nadir image. The point cloud from the image pair 
of forward and nadir (FN) mainly includes the points located on the 
north side of a forest canopy while that of nadir and backward (NB) 
should come from the opposite side. The point cloud from the image 
pair of forward and backward (FB) images should locate the top surface 
of a forest canopy. Therefore, the fusion of the three sets of point clouds 
could provide more information about the structure of forest canopy 
than any one of them because of the increased density of point cloud 
data. 

The Panchromatic Remote-sensing Instrument for Stereo Mapping 
(PRISM) onboard the Advanced Land Observation Satellite (ALOS) has 
acquired multiple global coverages from June 2006 to April 2011. 
China launched their third Earth resources satellite Zi Yuan 03 (ZY03) 
on January 9, 2012. ZY03 is specifically designed for the collection of ste- 
reo imagery with a resolution of 3.6 m for forward and backward views 
and 2.1 m for the nadir view similar to ALOS/PRISM. Therefore, it is 
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useful to explore the information contained in the point clouds of multi- 
view stereo imagery for regional mapping of forest canopy height and 
biomass. 

The test site and data used in this study will be described in 
Section 2. For the convenience of understanding, the basic principles 
of stereo imagery processing implemented in most software will be 
briefly reviewed in the first part of Section 3. The second part of 
Section 3 will describe the techniques used for analysis of point cloud 
data from multi-view stereo imagery. The features of point clouds 
over forested areas will be explored in Section 4 followed by discussion 
and conclusions. 

2. Test site and data 

2.3. Test site 

The test site is located in a forested area near Howland, Maine USA 
(45°08'N, 68°45'W). The stands in this northern hardwood boreal tran- 
sitional forest consist of hemlock-spruce-fir, aspen-birch, and hem- 
lock-hardwood mixtures. The common species include Populus 
tremuloides (quaking aspen), Betula papyrifera (paper birch), Tsuga 
canadensis (eastern hemlock), Picea rubens (red spruce), Abies balsamea 
(balsam fir), and Acer rubrum (red maple) (Huang et al., 2013). The re- 
gion features relatively level and gently rolling topography where the el- 
evation ranges from 40 m to 207 m within an area of 35 km x 35 km. This 
site has an Ameriflux tower situated within an intermediate aged forest 
(http://ameriflux.ornl.gov/fullsiteinfo.php?sid=54). The surrounding 
land is privately owned by a timber production company and has a his- 
tory of multiple forest management manipulations, such as clear-cut, 
select-cut and strip-cut (Huang et al., 2013). This site is currently used 
for interdisciplinary forest research and experimental forestry practices 
with extensive field and airborne data acquired since 1989. This site is 
covered by both the LiDAR waveform data acquired by the NASA LVIS 
in 2003 and 2009 respectively and the high density point cloud data of 
small footprint LiDAR acquired by NASA Goddard's LiDAR, Hyperspectral 
& Thermal Imager (G-LiHT) in June 201 2. Therefore this site is very suit- 
able to explore the features of points cloud from multi-view stereo 
imageries. 

2.2. Data 

2. 2 A. PRISM 

The ALOS/PRISM data is used in this study due to its global cover- 
age and potential for regional mapping of forest canopy height and 
biomass. PRISM is a panchromatic radiometer with a 2.5 m spatial 
resolution at nadir. PRISM has three independent optical systems 
for viewing in the nadir, forward and backward directions along 
the satellite orbit track. Forward and backward telescopes are in- 
clined + 24 and — 24° from nadir to realize a base-to-height ratio 
of 1.0. The nadir-viewing telescope covers a swath width of 70 km; 
whereas forward and backward telescopes cover a swath of 35 km. 
The wide field of view (FOV) provides three fully overlapped stereo 
(triplet) images of a 35 km width without mechanical scanning or 
yaw steering of the satellite. PRISM has 9 observation modes 
resulting from different combinations of the three views. The data 
used in this study were acquired on 09/05/2009 under the triplet ob- 
servation mode using all the three views (i.e., PRISM Mode 1 ) with a 
swath width of 35 km. it was processed to level 1B1 which is radio- 
metrically calibrated at the sensor. WGS84/EGM96 was used for hor- 
izontal and elevation reference of DSMs and point clouds, as well as 
the ground control points used in the calculation of rational polyno- 
mial coefficients (RPC). 

2.2.2. G-LiHT 

G-LiHT is a portable airborne system developed by the NASA God- 
dard Space Flight Center that simultaneously maps the composition, 


structure and function of terrestrial ecosystems (Cook et al., 2013). G- 
LiHT is composed of a scanning LiDAR, imaging spectrometer and ther- 
mal camera to fulfill the data coincidence in time and space for data fu- 
sion. The G-LiHT airborne laser scanner (VQ-480, Riegl Laser 
Measurement Systems, Horn, Austria) uses a 1550 nm laser that pro- 
vides an effective measurement rate of up to 150 kHz along a 60- 
degree swath perpendicular to the flight direction. At a nominal flying 
altitude of 335 m, each laser pulse has a footprint of - 10 cm diameter 
and is capable of producing up to 8 returns. Classified point cloud data 
and digital terrain and canopy height models are openly distributed 
on the G-LiHT webpage (http://gliht.gsfc.nasa.gov). The elevations are 
in meters referenced to the WGS84/EGM96 geoid. 

2.2.3. NED 

The national elevation dataset (NED) is the primary elevation data 
product of the Unite States Geologic Survey (USGS). This product is dis- 
tributed in geographic coordinates in units of decimal degrees. Its hori- 
zontal reference is the North American Datum of 1983 (NAD 83) and its 
vertical reference is the North American Vertical Datum of 1988 (NAVD 
88). The vertical reference of NED data were transformed to the WGS84/ 
EGM96 using GE01D12 software developed by the National Geodetic 
Survey (NGS) and the GEOTRANS software. 

2.2.4. LVIS 

NASA's LVIS is an airborne laser altimeter system designed, devel- 
oped and operated by the Laser Remote Sensing Laboratory at Goddard 
Space Flight Center. LVIS obtained sub-canopy and canopy-top topogra- 
phy data as well as canopy vertical structure information for forested 
sites at our study area in order to generate detailed forest structural 
data sets. The LVIS data used in this study were acquired in August, 
2009 using a nominal footprint size of 20 m. The LVIS data were released 
as three products, i.e. LVIS ground elevation (LGE), LVIS geolocated 
waveform (LGW) and LVIS canopy top elevation (LCE). The LGE includes 
location (latitude/longitude), ground surface elevation, and the heights 
of the energy quartiles (relative height to ground surface; RH25, RH50, 
RH75, and RH100) where 25%, 50%, 75% and 100% of the waveform ener- 
gy occur. The LGW provides the geocoded waveforms of each LiDAR shot. 
The RH50 of LVIS LGE data was rasterized to 30 m x 30 m pixels for eval- 
uating canopy height from PRISM and NED in following sections. The 
RH50 observations for footprints located within a pixel were averaged 
to be the value of the pixel. The pixels having no footprints were masked 
out in the analysis. The elevation reference of LVIS data is transformed 
from WGS-84 (i.e., GRS80) ellipsoid to the WGS84/EGM96 geoid using 
the GEOTRANS software developed by the National Geospatial- 
Intelligence Agency. 

3. Method 

3.3. The processing of stereo imagery 
3.1 A. Sensor model 

One critical component of photogrammetry is the sensor model 
which establishes the functional relationship between the two- 
dimensional image space (line, sample) and the three-dimensional ob- 
ject space (latitude, longitude, elevation) by defining the location and 
viewing direction (i.e. the imaging geometry including pitch, roll and 
yaw angles) of the camera in space (Tao & Hu, 2001 ). There are typically 
two categories of sensor models, i.e. physical and generalized models. 
Physical models are rigorous and normally yield high modeling accura- 
cy. However, a physical sensor model represents the physical imaging 
process and contains some critical sensor technical information which 
is held as confidential by some data providers, especially commercial 
satellite data vendors (Tao & Hu, 2001 ). In a generalized sensor model, 
the transformations between the image and the object space are repre- 
sented as general functions, such as polynomials or rational functions, 
without modeling the physical imaging process. The rational function 
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model (RFM) is a generalized sensor model, which uses ratios of poly- 
nomials to establish the relationship between the two-dimensional co- 
ordinates of the image of an object and the coordinates of the object in 
three-dimensional space. The physical parameters related to critical 
sensor information cannot be easily recovered from the RFM. In addi- 
tion, the generalized sensor model is not sensor-dependent so that gen- 
eral processing software can be used to process the stereo imagery from 
different platforms. Only the rational polynomial coefficients (RPC) of 
the RFM need to be updated. Therefore the RFM is attractive and used 
in the state-of-art digital photogrammetric software packages. 

In the rational function model, image pixel coordinates (r, c) are 
expressed as the ratios of polynomials of ground coordinates (X, Y, Z). 
In order to minimize the errors during the computations and improve 
the numerical stability of the equations, the two image coordinates 
and three ground coordinates are normalized to the range of — 1.0 to 
+ 1.0 (Tao & Hu, 2001 ). The RFM can be expressed as (Tao & Hu, 2001 ) : 


„_ r r o r _ c c 0 _X X 0 „ _ Y V 0 7 _Z Z 0 
r, ’ " c c " X, ’ " Y. n 4 


P i(X n .V n .Z n ) 
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p 4 (X n ,Y n ,Z n ) 
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where / = 1, 2, 3, 4. r 0 , c 0 and r s , c s are the offsets and scale values for the 
normalization of image coordinate respectively while X 0 , Y 0 , Z 0 and X s , 
Y s , Z s are respectively the offsets and scale values for the normalization 
of ground coordinates. P ( defined the transformation from X n , Y n , Z„ to 
r„, c n by the 20-term cubic form polynomial above. Therefore the RPC 
is always composed of these 90 coefficients. The RPC is provided in 
some stereo images and could be calculated using ground control points 
(GCP) by direct or iterative least squares methods as given in (Tao & Hu, 
2001 ). The two images in a stereo pair are referred to as left and right 
images respectively in traditional photogrammetry. Both images have 
their own RPCs. 

3.1.2. Stereo processing 

With the sensor model, a point in object space can easily be trans- 
formed into image space. However, the transformation from image 
space to object space needs the information of elevation. The central 
idea of photogrammetry is to determine the elevation of an object (or 
an image pixel) by measuring the displacement or difference in the ap- 
parent position of an object viewed along two different lines of sight 
(i.e., the parallax) of the matching points of the object on two stereo im- 
ages. The two matching points for an object should be identified first to 
measure their parallax. The identification of matching points is a process 
of two-dimensional image matching based on the spatial characteristics 
of the two stereo images. An important advance in digital photogram- 
metry is the introduction of epipolar geometry (or epipolar constraints). 
By the epipolar geometry for a ground point on one image, the position 
of its projection (or matching point) on the other image is not random 
within the two-dimensional image space but on a line referred to as 
the epipolar line in the image space. The ground point and its projection 
centers of two images compose a plane in three-dimensional object 
space referred to as epipolar plane. The intersection between the images 
and epipolar plane are the epipolar lines in the images. For a given point 
on one image we only need to search its matching point along its 
epipolar line on the other image. Therefore the epipolar geometry great- 
ly improves the computation efficiency by reducing the searching pro- 
cess from two-dimensional to one-dimensional space. 


The epipolar geometry further indicates that the epipolar lines will 
be parallel with each other if the inclined imaging plane is transformed 
into the horizontal plane using the imaging geometry expressed by the 
senor models. The transformation of inclined image plane to horizontal 
image plane can be expressed as 
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where (p, a> and k are the roll, pitch and yaw angles which can be cal- 
culated from the sensor model, i = 1 or 2 represents the left and right 
images respectively. /is the focal length of the camera. [x ; j/,] 7 is the co- 
ordinate of a pixel in inclined image while [x|, y|] T is the corresponding 
coordinate of the pixel in horizontal image, z) should be zero or a con- 
stant in the horizontal image. After both inclined images are trans- 
formed to their corresponding horizontal images using Eq. (2), the 
right horizontal image will be further transformed to the left horizontal 
images by a rotation and translation: 
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where 0 and [x 0 , yo] T are rotation angle and translation vector of the 
right horizontal image relative to the left. They need to be determined 
by some tie points identified from the two original stereo images by 
two-dimensional image matching or by manual selection. By applying 
Eqs. (2) and (3) with the tie points and sensor model, the two images 
of the stereo pair can be transformed into epipolar space, and are re- 
ferred to as epipolar images. The identification of matching points can 
be automatically conducted line by line between the epipolar images. 
The elevation of ground points can be calculated by the parallax of the 
identified matching points using simple geometric relations (Jan, 
1993). The point clouds are the collection of all the pixels with identified 
matching points. The digital surface model will be produced through the 
interpolation of the point clouds. 

In the above processing, the three-dimensional positions of pixels 
were calculated based on the left image while the right image only pro- 
vides the parallax information. Therefore the geometrical features of 
point clouds are the same as the left image. In this study the output of 
processing a stereo image pair includes three products: left image, 
point clouds and digital surface model. 

3.2. The synthesis of multi-stereo point clouds 

As mentioned above, ALOS/PRISM has three independent optical 
systems for viewing nadir (N), forward (F) and backward (B). Three dif- 
ferent combinations of these images (FN, FB and NB) can be used to gen- 
erate three different point cloud datasets. The synergy of these point 
clouds increases the density of the points to provide adequate informa- 
tion on the vertical structure of the ground objects. 

3.2.1. Co-registration of point clouds 

One basic prerequisite for the development of point clouds from 
different view combinations is that they must be accurately co- 
registered. Although the RFM is a good theoretical model of the im- 
aging geometry of a sensor, the RPC is not error-free because it is de- 
termined by the measurement of sensor imaging parameters or 
ground control points. The uncertainties in the imaging parameter 
measurements or the coordinates of ground control points will in- 
evitably be propagated to the RPC and finally be exhibited in the 
geocoding accuracy of stereo images, especially for high resolution 
images. An accurate co-registration method has been developed in 
our previous studies (Ni, Sun, Zhang, Guo, & He, 2014a). The 
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image to be resampled is referred to as the slave image, whereas the 
other image is referred to as the master image. It models the mis- 
registration between master and slave images caused by transla- 
tion, scale, skew, and rotation as 


ym . 


= A 


*s 

y s . 
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V 

where A= 

cos <p 

sirup 

K w 
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— sirup 

coscp 

0 l< y 
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where (x 0 , y 0 ) T is the translation vector, ( k x , k y ) are the scale factors 
along the sample and line directions, w is the skew term, and (p is the ro- 
tation angle. ( x m ,y m ) T are the center coordinates of a chip in the master 
image, whereas ( x s , y s ) T are the center coordinates of a chip in the slave 
image having maximum correlation with the master chip. The control 
points of these two images were searched automatically based on 
the correlation of image chips, and the point qualities were screened 
based on the signal-to-noise ratio. To achieve successful co- 
registration, the number of control points should be much more than 
the number of parameters to be determined. The over-determined sys- 
tem of linear equations can be solved using the least squares method. 
The left image was used as master image in the co-registration process. 
The point clouds and DSMs are then resampled using the same co- 
registration parameters for their left images. 


3.2.2. Removal of elevation divergence 

Besides the spatial displacement, the divergence between the eleva- 
tions from different view combinations could also be expected because 
of the errors associated with the RPCs. The accuracy of ground control 


points also affects the estimation of imaging geometry. The elevation var- 
iation caused by errors of these three angles would not be local ridges or 
valleys but overall surface inclination or distortion. In other words, the in- 
fluence of errors of the three angles on elevation measurements should 
have low spatial frequency. Therefore, low-pass filtering could reveal 
the divergence between the elevations from different view combinations. 
A method has been proposed in our previous studies for the removal of 
atmospheric affect on the phase image of interferometric synthetic aper- 
ture radar which also showed the feature of low spatial frequency (Ni, 
Sun, Zhang, Guo, & He, 2014b). An essential step in low-pass filtering is 
the determination of the window size. If the window size is too small, 
most high spatial frequency details will remain in the filtered results. As 
the window size increases, the high frequency details in the filtered 
image will decrease. When the window size is large enough, the filtered 
image contains only the low frequency features. The changes in the stan- 
dard deviation of the filtered image will become more stable as the win- 
dow size increases. The window size of the filter was selected based on 
the where the standard deviation stabilized. The low-frequency features 
were then removed from these PC data and their DSMs. 

3.2.3. Rectification to other source DEM 

The results from different view combinations should have been ac- 
curately co-registered with each other after the two-steps described 
above. Both FB and NB have been co-registered to FN and the elevation 
divergences have been removed. Therefore the three sets of point 
clouds could be merged to produce the synthesized point cloud. The el- 
evation data from other-sources, such as National Elevation Dataset 
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Fig. 1 . The spatial coverage of data used in this study: blue, red and yellow squares — forward, nadir, and backward images of PRISM, respectively, white — LVIS data, green — G-LiHT data. 
The background is true color image from Google Earth. 
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(NED), have been used as the ground surface height to extract vegeta- 
tion height from the elevation of stereo imagery (Ni, Sun, & Ranson, 
2013). As described in the previous two sections, the geocoding accura- 
cy and elevation of stereo imagery data are inevitably affected by the ac- 
curacy of RPC. Therefore, the synthesized point cloud should be firstly 
co-registered to NED using Eq. (4) as in our previous studies (Ni et al., 
in press-a). Then the elevation divergence can be removed by low- 
spatial filtering as described in the above section. By these two steps 
the synthesized point clouds should be accurately co-registered and 
rectified to the other-source DEM and the heights of these points 
above the DEM can be generated. 

3.3. Analysis scheme 

Fig. 1 shows the spatial coverage of the related dataset within the 
test site. The features of point clouds from ALOS/PRISM were ex- 
plored by comparisons with LV1S waveforms and G-LiHT point 
clouds over the footprint of LV1S (20 m). The resolution of products 
from the stereo processing of PRISM data is 3 m. The pixels located 
within a 7 x 7 (i.e. 21 m x 21 m) window centered in an LVIS foot- 
print were used to make the comparisons. In the best cases (all pixels 
have matching points in the stereo imagery), there will be 147 points 
within this 7x7 window (49 points in each view). In the vertical dis- 
tribution histogram of the PRISM point cloud, the bin was set to 1 m. 
The bin size for the vertical distribution histogram of the G-LiHT 
point cloud was set to 30 cm which is the same as the bin size of 
LVIS (Sun et al., 2008). 

Averaging the heights of all points within a resolution grid of 
30 m produced an elevation map from ALOS/PRISM. A canopy height 
map was further generated from the elevation map of ALOS/PRISM 
by taking the NED as the ground surface elevation as in many other 
studies (Kellndorfer et al., 2004). The comparisons between the can- 
opy height map and RH50 of LVIS data at a resolution of 30 m were 
also made because RH50 is a good indicator of forest biomass 
(Huang et al., 2013). 

4. Results 

4.1. Removal of elevation divergences 

As described in the previous section, the RFM model (i.e. RPC) of 
each view was reconstructed using ground control points. The same 
set of ground control points was used for all three views to minimize 
the influence of ground control points on the data synthesis. There 
were 1768 points selected by the automatic image matching with accu- 
racy better than 0.1 pixels. Table 1 lists the estimated imaging geometry 
of the three views and their corresponding accuracy. The accuracy was 
depicted by the root mean square errors of predicted image coordinates. 

Fig. 2(a)-(c) shows the elevation difference between FB and FN (a), 
NB and FN (b) as well as between FN and NED (c) after they were accu- 
rately co-registered with each other. The high-frequency spatial chang- 
es in (a)-(c) are caused by local objects such as trees seen from different 
directions, while the low-frequency changes in (a)-(c) by errors of im- 
aging geometry used in stereo image processing which need to be re- 
moved. The low-pass filtering was applied to the resized elevation 
difference images with a factor of 0.1 to reduce the computation load. 
Since the original pixel size of PRISM elevation images is 3 m, the 


Table 1 

The estimated imaging geometry of different views using ground control points. 


View 

Imaging geometry (°, degree) 


Accuracy (in pixel) 


Roll (<p) 

Pitch (to) 

Yaw (hj 

RMSX 

RMSY 

Forward 

Nadir 

Backward 

-0.114778 

0.167656 

0.254939 

-23.996583 

0.092828 

24.228831 

-12.776417 

-14.512201 

-13.142084 

0.02743 

0.00470 

0.03632 

0.00748 

0.00811 

0.00794 


pixel size of images shown in Fig. 2 is about 30 m. Fig. 2(d)— (f) shows 
the changes of standard deviation with respect to the window size 
used in lowpass filtering, corresponding to Fig. 2(a)-(c) respectively. 
The window size of 71 x 71 was used for (a) and (b) and 181 x 181 
for (c) because the standard deviation become stable after these win- 
dow sizes (Ni et al., 2014b). Fig. 2(g)— (i) were the results of 
Fig. 2(a)-(c) after the removal of elevation divergences using their cor- 
responding window size. It was clear that the elevation divergences 
caused by the error of the RFM model was successfully removed. 

4.2. Analysis of point clouds 

Fig. 3(a) presents the horizontal distributions of point clouds from 
different view combinations as different colors. The Red, Green and 
Blue colors were assigned to pixels matched in stereo processing of 
NB, FN and FB combinations, respectively. If a pixel had no matching 
pixel its value was set to zero, so the dark pixel in Fig. 3 indicates that 
it could not be recognized from all stereo combinations while the 
white pixel indicated that it could be recognized in all of the three 
view combinations. The color of a pixel depends on which of the three 
view combinations was valid for calculating the height of the pixel. 
The colored image shows the spatial complementation of point clouds 
from multi-views. Fig. 3(b) and (c) are the enlargements over a lake 
area and clear-cut patches, respectively. The white lines that appear 
on the boundary of water bodies and shore, around clear cut patches, 
and roads indicate that object edges were well defined in the stereo 
imagery. 

The vertical lines evenly spaced across the epipolar images shown in 
this figure might be a specific effect of algorithms used in the searching 
of matching points by the software because no such feature existed in 
the stereo image pair. No visible effect was observed on the difference 
image of two DSMs as well as the final canopy height map from the ste- 
reo data. 

Fig. 4 compares the point clouds from ALOS/PRISM with the point 
cloud data from G-LiHT and LVIS waveforms over typical forest stands. 
Each row represents a forest stand. The middle column (e, f, g and h) 
is the canopy height model (CHM) from G-LiHT (21 x 21 grids) with a 
grid size of 1 m centered at a LVIS footprint. The spatial patterns of can- 
opy height shown in Fig. 4(e), (f), (g) and (h) indicate that these four 
cases represent 1 ) two-layer dense forest (at elevations of -90 m and 
100 m), 2) disturbed forest (tall canopy at elevation of -124 m 
with gaps at -108 m), 3) sparse forest (several small trees with heights 
6-9 m) and 4) uniform dense forest (no gaps, the ground elevation at this 
spot is -90 m, so the tree height ranges from 10 to 20 m), respectively. 

The left column (a, b, c and d) compares the vertical histograms of 
point clouds from PRISM (red — NB, yellow — FN, blue — FB), verti- 
cal height profile (histogram) of G-LiHT PC data (green curve) in the 
21 m x 21 m grid, LVIS waveform (pink curve), and shows the elevation 
of NED, mean PRISM, and LVIS height metrics (ground, RH50 and 
RH100) by horizontal lines of solid black, dotted black, solid red, thin 
solid green and thin dotted green, respectively. The horizontal axis of 
the plots in this column is the DN of the LVIS waveform. The point 
number in each bin of ALOS/PRISM and G-LiHT data were scaled using 
their corresponding ratios shown in the plots to make the peaks of 
PRISM, G-LiHT histograms the same as the LVIS data. The number of 
points from NB, FB and FN was plotted as red, yellow and blue respec- 
tively to show the contributions from different view combinations. 
The text at the top-center of the plots shows the ID and shot number 
of the LVIS laser pulse, latitude/longitude of the LVIS footprint, and the 
line/sample of the LVIS center on the PRISM image. The text at the bot- 
tom of each plot shows ratios used to normalize the histograms of 
PRISM and G-LiHT, numbers of points of PRISM PC data in the 21 m by 
21 m window, the elevations of ground surface, RH50 and RH100 of 
the LVIS waveform, and the NED elevation, mean height of PRISM PC. 
Fig. 4 shows that the point clouds of ALOS/PRISM had different features 
over these four forest conditions. The first layer could be captured by the 
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Fig. 2. The removal of elevation divergence: (a) FB-FN; (b) NB-FN; (c) FN-NED; (d), (e) and (f) are the changes of standard deviation with the window size used in low-pass filtering of (a), 
(b) and (c) respectively, (g), (h) and (i) were the results of (a), (b) and (c) after the removal of elevation divergence respectively. 


point clouds of ALOS/PRISM as shown in Fig. 4(a) while the second layer 
could not be seen. The point clouds of ALOS/PRISM scattered from the 
forest canopy to the ground surface when the forest canopy was not 
closed as shown in Fig. 4(b). One common characteristics observed in 
Fig. 4(a), (b) and (c) was that point clouds of ALOS/PRISM could not 
capture the maximum height of forest stands as shown by G-LiHT. 
This feature could be an interpretation for the underestimation of forest 
height as reported in other studies. The maximum height (top of trees) 
could be missed because of the relatively large pixel size (2-3 m). Some 
outlier points above the maximum height could be observed in uniform 
forest areas as shown in Fig. 4(d). This phenomenon was expectable be- 
cause the identification of matching points was based on the automatic 
image matching along epipolar lines, which tends to fail in cases of low 
image texture. There were only 24 points recognized in stereo matching 
which was about 1/3 of those in the other three forest cases. This phe- 
nomenon also occurred on bare flat ground, which also had low texture. 
Another important phenomenon in all of these four cases was that the 
vertical distribution of point clouds from different view combinations 


did not coincide. The higher point was not seen from the combination 
of FB, but it is seen in combinations of NB and FN. This demonstrates 
the complementarity of point clouds from different view combinations. 
For all four forest cases shown here the mean height of PRISM point 
cloud data is greater than RH50 and less than RH100 of LVIS data. 
Since the RH100 represents the top canopy height and RH50 has been 
used successfully for forest biomass estimation, the comparisons of 
PRISM height with RH50 and RH100 will reveal the feature and useful- 
ness of the height information derived from PRISM PC data (see Fig. 5 
below). 

The right column (i, j, k and 1 in Fig. 4) compares the vertical profiles 
of point clouds with G-LiHT profiles over 100 m rectangles using the 
same center as Fig. 4(a), (b), (c) and (d), respectively. The horizontal 
axis of the third column is the number of points from ALOS/PRISM. 
The number of points from G-LiHT was normalized to the same range 
using a ratio of the peaks of the two histograms. As shown in these 
plots, the vertical distributions of point clouds over large spatial sales 
( 100 m) were similar to that at the scale of an LVIS footprint. The vertical 
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Fig. 3. The horizontal distribution of point clouds, (a) The image composed of point clouds from different view combinations (Red: NB, Green FN, Blue: FB); (b) the enlarged subset of (a) in 
the blue rectangle; (c) the enlarged subset of (a) in the red rectangle. 


distributions of point clouds from different view combinations over 
large spatial scales were similar for dense forest as shown in Fig. 4(i) 
and (1). However, it is quite different for more open forest canopies as 
shown in Fig. 4(j) and (k). For forested areas (Fig. 4(i), (j) and (1)) the 
ascending edge of vertical profiles of point clouds from ALOS/PR1SM re- 
sembles that of G-LiHT very well. This feature indicates that more infor- 
mation could be extracted from PRISM point cloud by constructing 
useful height indices as those from LiDAR point clouds. For dense cano- 
pies as shown in Fig. 4(i) and (1), the ground surface cannot be identified 
from PRISM PC data. 


4.3. Canopy height map 

Fig. 5(a) shows a subset of canopy height map at a resolution of 9 m 
derived by subtracting NED from the average height of point clouds of 
multi-view combinations. Fig. 5(b) is the panchromatic image from 
nadir view PRISM data covering the same area as Fig. 5(a). The spatial 
features of roads, forest edges, patches and strip cuts at different direc- 
tions shown in Fig. 5(b) are also clearly exhibited in Fig. 5(a). Fig. 5(c) 
shows the forest canopy height map of the entire PRISM scene at a res- 
olution of 30 m. The blue rectangle in Fig. 5(c) is the enlarged area in 
Fig. 5(a) and (b). The yellow polygon is the area covered by LVIS data. 
Fig. 5(d) is the enlargement of the yellow polygon area in Fig. 5(c). 
Fig. 5(e) is RH50 of LVIS data. The spatial pattern of Fig. 5(d) resembled 
that of Fig. 5(e) very well. Fig. 5(f) is the scatter plot of PRISM canopy 
height vs. LVIS RH50 shown in Fig. 5(d) and (e). The slope of the linear 
regression was 1.04 with R 2 of 0.74. Fig. 5(g) is the scatter plot of PRISM 
canopy height vs. LVIS RH100. The slope of the linear regression was 
0.67 with a R 2 of 0.57. Apparently, canopy height index (mean height 
of the point cloud) derived from the PRISM triplet and NED is higher 
than RH50 while lower than RH100 and has better correlation with 
RH50 than RH100. These results clearly demonstrated that canopy 
height index derived from PRISM triplet and NED had comparable infor- 
mation for forest biomass as RH50 of the LVIS data. 


5. Discussion 

In this study, the synthesis of point clouds from combinations of 
multi-view stereo imagery was conducted in epipolar space, as shown 
in Figs. 2 and 3, not in object space as those shown in Fig. 5(a) and (c) 
(geocoded). The processing in epipolar space enables the easy visualiza- 
tion and examination of the features of epipolar lines. Besides, the errors 
introduced by geocoding were excluded and the storage requirement 
was reduced by minimizing void pixels. 

Two steps of processing, i.e. the accurate co-registration and removal 
of elevation divergence, were used for the synthesis of point clouds 
from multiple combinations of multi-view stereo imagery. As noted in 
the previous section, the mis-registration and elevation divergence 
were caused by the error of RPC. Accurate co-registration was always 
necessary especially for high-resolution image. The removal of elevation 
divergence might not be necessary if the accuracy of RPC was high 
enough, which could be judged by whether systemic offsets, inclina- 
tions or periodic black/white patterns could be observed in the differ- 
ence image of two DSMs. 

Fig. 4(b), (c), (j) and (k) showed that some points near or on the 
ground surface might be identified from the point clouds of ALOS/ 
PRISM. This feature indicates that it might be possible to derive heights 
of sparse or unclosed forests solely based on point clouds of ALOS/ 
PRISM. However, it is difficult to derive the maximum tree height be- 
cause the canopy peak is always missed when the image resolution is 
not high enough. 

The mean height of PRISM PC in a grid (e.g. 21m shown in Fig. 4) is 
not the top canopy height (<RH100) and higher than the RH50. At a 
large grid, such as 1 00 m, the resemblance of the vertical distribution 
of ALOS/PRISM over dense forest with those of G-LiHT point clouds indi- 
cates that the vertical distribution of PRISM PC may be used to derive 
more height indices for characterization of forest spatial structure and 
carbon storage, which will be pursued in our future studies. 

The synthesis of point clouds of multiple views increases the density 
of point cloud data, which is important to form LiDAR-like data. The 
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Fig. 4. The vertical distribution of point clouds from ALOS/PR1SM over typical forest stands and comparison with that of G-LiHT data and LVIS waveform, (a), (b), (c), and (d) were vertical 
profiles of point clouds over four different forest types: two-layer dense forest, disturbed forest, sparse forest and uniform dense forest respectively, (e), (f), (g) and (h) were the canopy 
height models from G-LiHT with a resolution of 1 mover the footprint of (a), (b), (c), and (d) respectively; (i), (j), (k) and (1) were the vertical distribution of point clouds from ALOS/PR1SM 
as well as from G-LiHT over the 100 m x 100 m areas using the same center as (a) (b), (c) and (d) respectively; The ratios printed in plots are the ratios of the maximum bin value of LVIS 
returns to PRISM or G-LiHT. The “ground", “RH50” and “RH100” were from LVIS data. The “Mean” was the average of all points of ALOS/PR1SM within a pixel (21 m or 100 m). 


effects are more obvious for sparse and unclosed forests. This is especial- 
ly important for the mapping of forest structures using stereo imagery 
with relatively low spatial resolution such as PRISM data (2.5 m). 


Some abnormal points from ALOS/PR1SM point cloud data were ob- 
served over uniform forest areas. The influence of these points on the 
mean elevation was not severe but special care should be taken when 
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Fig. 5. The mapping of forest vertical structure using ALOS/PRISM data: (a) a subset of canopy height map at a resolution of 9 m; (b) a subset of a panchromatic image from PRISM nadir 
image covering the same area as (a); (c) the canopy height map at a resolution of 30 m; (d) the subset of (c) covered by LV1S data; (e) RH50 from LV1S data; (f) correlations between (d) 
and (e); (g) correlations between (d) and RH100. 
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other height indices need to be derived from the vertical profiles of 
point clouds. The image texture may provide some information for the 
identification of these special cases. 

A forest canopy height map was produced in this study from the syn- 
thesized point cloud data of ALOS/PR1SM and NED. Fig. 5 exhibited the 
potential for the mapping of forest biomass using ALOS/PR1SM and 
NED data. Studies by (Drake et al., 2002; Huang et al., 2013) have 
found that the RH50 has good linear correlation with biomass and it 
has been widely used for forest biomass estimation. Because less points 
near and at-ground surface were captured by ALOS/PR1SM than LVIS, 
the PRISM height is systematically higher than RH50 (a bias of 4.3 m 
in Fig. 5(f)). The high correlation between the average height of 
PRISM point cloud and the RH50 indicates that the height data from 
synthesized point clouds of ALOS/PRISM could be used for forest bio- 
mass estimation. 


6. Conclusion 

Significant progress in digital photogrammetry makes the data pro- 
cessing easier than ever before. The features of point clouds from stereo 
imagery over forested area were investigated in this study taking the 
point clouds from small footprint LiDAR data and LiDAR waveforms as ref- 
erences. A method for the synergy of point clouds from multi-views was 
proposed and validated. Then the features of synthesized point clouds 
were examined. The results showed that the point clouds from different 
view combinations were complementary to each other spatially. A forest 
canopy height map was produced by subtracting NED from the averaged 
elevation of synthesized point clouds from stereo imagery. The canopy 
height map exhibited specific features of roads, forest edges and patches. 
The spatial pattern of canopy height map resembles that of LVIS RH50 
very well. The linear regression showed that the canopy height 
map had a good correlation with RH50 of LVIS data with slope = 1.04 


and R 2 = 0.74. This indicates that the combination of stereo imagery 
with NED has potential for mapping forest biomass. 

The stereo imagery could see both canopy and ground surfaces in 
unclosed forest but only the canopy surface in the dense forest. Only 
the top layer could be captured for multi-layered dense forest. Due to 
obscured ground surface in the point clouds of stereo imagery, a positive 
bias was observed in comparison of PRISM height with LVIS RH50. The 
comparison with RH 1 00 indicates that the PRISM height has some infor- 
mation on the top canopy height too. The point cloud from ALOS/PRISM 
exhibited interesting features in detecting forest vertical structures, 
which will be further explored in future studies. 
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